28 research outputs found

The Flexible Group Spatial Keyword Query

Author: D Papadias
G Cong
GR Hjaltason
K Yao
ME Ali
N Roussopoulos
X Cao
Z Li
Publication date: 24/04/2017

We present a new class of service for location-based social networks, called the Flexible Group Spatial Keyword Query, which enables a group of users to collectively find a point of interest (POI) that optimizes an aggregate cost function combining both spatial distances and keyword similarities. In addition, our query service allows users to consider the tradeoffs between obtaining a sub-optimal solution for the entire group and obtaining an optimized solution but only for a subgroup. We propose algorithms to process three variants of the query: (i) the group nearest neighbor with keywords query, which finds a POI that optimizes the aggregate cost function for the whole group of size n, (ii) the subgroup nearest neighbor with keywords query, which finds the optimal subgroup and a POI that optimizes the aggregate cost function for a given subgroup size m (m <= n), and (iii) the multiple subgroup nearest neighbor with keywords query, which finds optimal subgroups and corresponding POIs for each of the subgroup sizes in the range [m, n]. We design query processing algorithms based on branch-and-bound and best-first paradigms. Finally, we provide theoretical bounds and conduct extensive experiments with two real datasets which verify the effectiveness and efficiency of the proposed algorithms.

arXiv.org e-Print Archive

Crossref

Bulk Insertions into xBR+-trees

Author: G Roumelis
GR Hjaltason
L Arge
L Chen
R Choubey
S Shekhar
T Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole and not by inserting these items one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a non-existing index from scratch, when the dataset to be indexed is available beforehand. The xBR+-tree is a balanced, disk-resident, Quadtree-based index for point data, which is very efficient for processing spatial queries. In this paper, we present the first algorithm for bulk insertion into xBR+-trees. This algorithm incorporates extensions of techniques that we have recently developed for bulk loading xBR+-trees. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. inserting items one-by-one for updating xBR+-trees, regarding performance (I/O and execution time) and the characteristics of the resulting trees. We also present experimental results regarding the query-processing efficiency of xBR+-trees built by bulk insertions vs. xBR+-trees built by inserting items one-by-one.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

Author: B Naidan
DD Lewis
DM Blei
DW Jacobs
E Chávez
G Chechik
GR Hjaltason
GT Toussaint
H Samet
L Boytsov
M Aumüller
S Kullback
S Robertson
T Skopal
Y Malkov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/10/2019

We demonstrate that a graph-based search algorithm (relying on the construction of an approximate neighborhood graph) can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves the way to a new line of research on designing index-specific graph-construction distance functions.

arXiv.org e-Print Archive

Crossref

Algorithms for Constrained k-Nearest Neighbor Queries over Moving Object Trajectories

Author: Baihua Zheng
E Frentzos
Gencai Chen
GR Hjaltason
K Mouratidis
K Raptopoulou
KL Cheung
MF Mokbel
Qing Li
R Benetis
Y Gao
Yunjun Gao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2010

Crossref

Institutional Knowledge at Singapore Management University

Using metric space indexing for complete and efficient record linkage

Author: A Reid
B Ramadan
C Li
D Hand
G Papadakis
GR Hjaltason
H Newcombe
IP Fellegi
L Bo
P Christen
P Zezula
Q Wang
R Connor
RC Steorts
V Levenshtein
XL Dong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Anonymisation of geographical distance matrices via Lipschitz embedding

Author: AS Whittemore
BS Everitt
CD Lloyd
DR Helsel
G Duncan
GR Hjaltason
GT Duncan
H-W Jung
J Bourgain
J Höhne
J Konc
JJ Trinckes
K El Emam
K Kenthapadi
K Riesen
KC Clarke
KH Hampton
L Sweeney
LA Waller
M Kroll
Martin Kroll
MM Merener
MP Armstrong
MP Gutmann
Rainer Schnell
RS Bivand
S Dray
SC Wieland
T Dalenius
Ö Uzuner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

BACKGROUND: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas the research focus has been on the anonymisation of point locations, the disclosure risk arising from the publishing of inter-point distances and corresponding anonymisation methods have not been studied systematically. METHODS: We propose a new anonymisation method for the release of geographical distances between records of a microdata file, for example patients in a medical database. We discuss a data release scheme in which microdata without coordinates and an additional distance matrix between the corresponding rows of the microdata set are released. In contrast to most other approaches, this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding. RESULTS: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters. CONCLUSION: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific-use files and as an additional tool for record-linkage studies.

City Research Online

Crossref

Springer - Publisher Connector

PubMed Central

Tightly-coupled spatial database features in the Odysseus/OpenGIS DBMS for high-performance

Author: GB Hall
GR Hjaltason
J Song
Jae-Gil Lee
Jun-Sung Kim
K Whang
Ki-Hoon Lee
Kyu-Young Whang
M Lee
Min-Jae Lee
Min-Soo Kim
S Banerjee
Wook-Shin Han
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Author: AGK Janacek
André Skupin
BC Vanteru
Bob Schijvenaars
Colin Allen
David Newman
DJ Newman
DK Harman
DM Blei
EM Voorhees
EP Jiang
F Janssens
G Gorrell
G Salton
GL Poulter
GR Hjaltason
HM Müller
J Lewis
J Lin
Joseph R. Biberstine
K Börner
K Järvelin
K Sparck Jones
Katy Börner
Kevin W. Boyack
KW Boyack
MA Hearst
MD Cao
Michael Patek
MW Berry
N Jardine
Nianli Ma
NJ Belkin
P Ahlgren
P Calado
P Castells
R Kassab
R Klavans
Richard Klavans
Russell J. Duhon
S Deerwester
S Martin
SE Robertson
T Couto
T Hofmann
T Kohonen
T Theodosiou
TG Kolda
TK Landauer
WS Cooper
Y Aphinyanaphongs
Y Yamamoto
Publication venue: Public Library of Science
Publication date: 01/01/2011

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.

Public Library of Science (PLOS)

Crossref

IUScholarWorks (University of Indiana)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

SHIATSU: tagging and retrieving videos without worries

Author: A Dorado
AG Hauptmann
AWM Smeulders
C-W Su
Corrado Romani
GR Hjaltason
H Zhao
I Bartolini
Ilaria Bartolini
J Canny
J Yuan
L Wang
MA Hearst
Marco Patella
MS Lew
N Rasiwasia
P Geetha
PY Liu
R Datta
R Kasturi
T Barbu
TN Shanmugam
V Chasanis
Z Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Efficient general spatial skyline computation

Author: GR Hjaltason
JB Rocha-Junior
Qianlu Lin
Wenjie Zhang
Xuemin Lin
Ying Zhang
Publication venue: 'Springer Science and Business Media LLC'

Crossref